A NEW MEASURE OF DISPERSION*

By Truman L. Kelley, Stanford University



The measures of dispersion in common use are: (1) the standard 
deviation; (2) average deviation; (3) quartile deviation; and (4) the 
range. It is generally admitted that the merit of the range as a meas- 
ure of dispersion lies in its simplicity of determination and interpre- 
tation. To calculate its probable error has not been attempted; but 
were this done it would have little significance, for the range itself is not 
a constant function of a distribution, since, in general, it increases with 
an increase in population. The range is therefore unsatisfactory as a 
measure of dispersion, because it undoubtedly has a large probable 
error, and also because its significance varies as populations of different 
sizes are being considered. 

The probable errors of the other three measures are given in the sec- 
ond column of the accompanying table. The values in columns 2 and 
3 must be multiplied by the standard deviation of the distribution, and 
the values in columns 2 and 4 must be divided by the square root of
the number of cases ($\sqrt{n}$) before they will serve in any particular
problem. For purposes of comparison of different measures, these factors
are here omitted. 

TABLE

Column headings:
1. Measure of dispersion (D)
2. Probable error of measure of dispersion †
3. Measure of dispersion in terms of the standard deviation; range
   covered by measure in case of a normal distribution ‡
4. The probable error of the measure of dispersion divided by the
   measure of dispersion §
5. Population necessary to secure a reliability equal to that of the
   standard deviation when determined from 100 cases

            1                     2         3         4         5

Standard deviation             0.4769    1.0000    0.4769      100
Average deviation              0.4066    0.7979    0.5096      114
Quartile deviation             0.5306    0.6745    0.7867      272
Range between percentiles:
  49 and 51                                        4.721      9798
  45 and 55                                        2.036      1822
  40 and 60                                        1.378       833
  35 and 65                                        1.082       515
  30 and 70                                        0.906       360
  25 and 75                              1.3490    0.787       272
  20 and 80                                        0.702       216
  10 and 90                              2.5631    0.6001      158
   7 and 93                              2.9640    0.5905      153
   5 and 95                              3.2897    0.5964      156
   2 and 98                                        0.665       196
   1 and 99                                        0.756       251

* Credit is due to William Beta and W. J. Osborn for the calculation of several of the probable errors
reported in this article.

† The values in column 2 should be multiplied by $\sigma$ and divided by $\sqrt{n}$ to give the full definition.
‡ The values in column 3 should be multiplied by $\sigma$.
§ The values in column 4 should be divided by $\sqrt{n}$.
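
The relation between columns 4 and 5 of the table can be illustrated with a short sketch. It assumes, as the column heading suggests, that column 5 is the number of cases at which a measure's relative probable error (column 4 divided by $\sqrt{n}$) equals that of the standard deviation at 100 cases. The function name `cases_needed` and the dictionary are illustrative only and do not come from the article.

```python
# Relative probable errors taken from column 4 of the table
# (each still to be divided by sqrt(n)).
RELATIVE_PE = {
    "standard deviation": 0.4769,
    "average deviation": 0.5096,
    "quartile deviation": 0.7867,
    "10-90 percentile range": 0.6001,
}


def cases_needed(relative_pe, reference=0.4769, reference_cases=100):
    """Cases n for which relative_pe / sqrt(n) equals
    reference / sqrt(reference_cases)."""
    return reference_cases * (relative_pe / reference) ** 2


for name, value in RELATIVE_PE.items():
    print(f"{name}: about {cases_needed(value):.0f} cases")
```

Because the required number of cases grows with the square of the relative probable error, a modest difference in column 4 becomes a large difference in column 5.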
The formula used for the calculation of the probable error of (1) the
standard deviation is $0.6745\,\dfrac{\sigma}{\sqrt{2n}}$. The following
formulas for the probable errors of (2) the average deviation and (3)
the inter-percentile range have been derived by the writer from
elementary principles:

(2) Probable Error of the average deviation equals $0.5096\,\dfrac{A.D.}{\sqrt{n}}$.

(3) Probable Error of the inter-percentile range equals

$$0.6745\sqrt{\frac{n p_1 q_1}{y_1^2} + \frac{n p_2 q_2}{y_2^2} - \frac{2 n p_1 q_2}{y_1 y_2}},$$

in which $p_1$ and $p_2$ are the proportions lying below the
percentiles studied; e.g., if the probable error of the inter-quartile
range is being determined, $p_1 = 0.25$ and $p_2 = 0.75$ ($p_1$ being
always the smaller of these two proportions). The $q$'s are defined by
the usual relationship $p_1 + q_1 = 1$. $y_1$ is the ordinate of the
distribution at the $p_1$ percentile, and may be directly read from the
distribution (preferably after slight smoothing), but if the normal
distribution is assumed, it can be determined from Sheppard's tables of
the ordinates of the normal probability integral.
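
As a rough check on formula (3), the following sketch evaluates it for a normal distribution. It is only an illustration under stated assumptions: Python's `statistics.NormalDist` stands in for Sheppard's tables, the ordinate $y$ is taken as $n$ times the normal density, and the function name `pe_interpercentile_range` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # unit normal, used in place of Sheppard's tables


def pe_interpercentile_range(p1, p2, n=1, sigma=1.0):
    """Probable error of the distance between the 100*p1 and 100*p2
    percentiles of a normal distribution with n cases (p1 < p2)."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    # Frequency ordinates at the two percentiles: n times the density.
    y1 = n * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p1)) / sigma
    y2 = n * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p2)) / sigma
    variance = (n * p1 * q1 / y1**2
                + n * p2 * q2 / y2**2
                - 2 * n * p1 * q2 / (y1 * y2))
    return 0.6745 * sqrt(variance)


# Quartile deviation: half the inter-quartile range, so half its probable error.
print(pe_interpercentile_range(0.25, 0.75) / 2)
```

With `n=1` and `sigma=1.0` the result is in the same units as the table, that is, still to be multiplied by $\sigma$ and divided by $\sqrt{n}$; the quartile-deviation value printed here should fall close to the 0.5306 of column 2.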

In case it is desired to obtain the probable errors of distances
between percentiles leaving equal amounts of area at each end of
the distribution, the formula becomes for the normal distribution,

$$P.E._D = \frac{0.6745\,\sigma}{z\sqrt{n}}\sqrt{2p_1q_1 - 2p_1^2},$$

in which $D$ is the distance between the percentiles $100p_1$ and
$100(1-p_1)$, $\sigma$ is the actual standard deviation of the
distribution, and $z$ is the entry for the ordinate in Sheppard's tables
corresponding to the proportion $p_1$. The formula for the probable
error of this inter-percentile range is not in convenient shape for
actual calculation, for it contains $\sigma$, which is usually unknown.
As, however, the relationship between $D$ and $\sigma$ in the case of a
normal distribution can be immediately obtained by referring to
Sheppard's tables for any proportion, $p$, the formula can be readily
transformed into one for practical application. In case $p_1 = .10$, so
that $D$ is the distance between the 10th and 90th percentiles, this
formula becomes $P.E._D = \dfrac{0.6001}{\sqrt{n}}\,D$. Values for other
inter-percentile ranges are given in column 4 of the table.

Empirical calculation in the case of a normal distribution of the
probable errors of inter-percentile ranges in terms of their own
magnitudes (such quantities as 0.6001 above) shows that the most
reliable $D$ for a given percentage difference, i.e., for a given
$p_2 - p_1$ magnitude, is to be obtained when equal proportions are cut
off at each end of the distribution, or, in other words, when
$p_2 = 1 - p_1$. Accepting this empirical finding as evidence of
universal truth, the writer examined analytically ranges equally
curtailed at the extremes. $\dfrac{P.E._D}{D}$ is the function which it
is desired to make a minimum. Differentiating this and setting the
derivative equal to zero and solving, results in the value
$p_1 = 0.06917$ and

$$\frac{P.E._D}{D} = \frac{0.59053}{\sqrt{n}}.$$

Accordingly, if a distribution is normal, the most
reliable inter-percentile range is found to be the range between 6.917
and 93.083. An examination of column 4 shows that the probable 
error of this range is not so much greater than that of the standard de- 
viation as to make it an unserviceable measure, especially for rapid in- 
vestigation or when percentiles must be calculated in any case. 

In place of this 7 to 93 percentile range, practically nothing of relia- 
bility is lost if a slightly different range is used. Because of the very 
fortunate constant 0.6001, and because of the general necessity of 
knowing both the 10th and 90th percentiles if any percentiles at all are 
needed, it is herewith pointed out that the 10 — 90 percentile range could 
to advantage be used in place of the semi-inter-quartile range or quartile 
deviation as a measure of dispersion. The relation in a normal dis- 
tribution between this range and the quartile deviation is expressed by 
the equation $Q = 0.26315\,D$; between this range and the standard
deviation, $\sigma = 0.39015\,D$.
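
A minimal sketch of how the 10-90 percentile range might be used in practice follows. It assumes a roughly normal sample, uses the constants quoted in the text (0.39015, 0.26315, 0.6001), and relies on `statistics.quantiles` with the "inclusive" method as one of several possible percentile conventions; the article does not prescribe any of these choices. The function name `ten_ninety_summary` is illustrative.

```python
from math import sqrt
from statistics import quantiles


def ten_ninety_summary(values):
    """Return D = P90 - P10 with the estimates the text derives from it.

    The constants hold only under the normal-distribution assumption.
    """
    deciles = quantiles(values, n=10, method="inclusive")  # 9 cut points
    p10, p90 = deciles[0], deciles[-1]
    d = p90 - p10
    n = len(values)
    return {
        "D": d,
        "sigma_estimate": 0.39015 * d,        # standard deviation implied by D
        "Q_estimate": 0.26315 * d,            # quartile deviation implied by D
        "probable_error_of_D": 0.6001 * d / sqrt(n),
    }


print(ten_ninety_summary(list(range(101))))
```

The example input is a flat run of integers, so the implied standard deviation is only a demonstration of the arithmetic, not a faithful estimate for that sample.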

The properties of the 10—90 percentile range here reported are based 
upon a normal distribution, but an examination of the general formula 
for the probable error of an inter-percentile range suggests that this 
10-90 percentile range is superior to the quartile deviation for
practically all distributions. Comparatively it is more excellent for
leptokurtic or peaked distributions, and slightly less excellent, though still
much more reliable than the quartile deviation, for platykurtic or flat 
topped distributions. 

The 10—90 percentile range may readily be used as a basis for de- 
termining the existence of skewness and the nature of the kurtosis of a 
distribution. If the distribution is not skew, the average of the 10th 
and 90th percentiles will be equal to the median. The difference be- 
tween the average of these two percentiles and the median is thus a 
measure of skewness. The probable error of this difference is equal to 
$0.40412\,\dfrac{D}{\sqrt{n}}$. If the difference is small with reference
to its probable error, skewness is not established.

Further, for a mesokurtic distribution, $Q = 0.26315\,D$. Therefore,
if $\dfrac{Q}{D}$ is < 0.26315, the distribution is leptokurtic, and if
$\dfrac{Q}{D}$ is > 0.26315, the distribution is platykurtic. The
probable error of $\dfrac{Q}{D} = \dfrac{0.18736}{\sqrt{n}}$.

The two formulas just given are based upon a normal distribution. 
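
The two checks just described can be sketched as follows. This is an illustration under assumptions, not the author's procedure: percentiles are taken with `statistics.quantiles` (inclusive method), the constants 0.40412 and 0.18736 are those of Appendices D and E, and reading the first as a multiple of $D/\sqrt{n}$ is inferred from the arithmetic of Appendix D. The function name and the returned keys are hypothetical.

```python
from math import sqrt
from statistics import quantiles


def skewness_and_kurtosis_checks(values):
    """Percentile-based skewness and kurtosis indicators with their
    probable errors (normal-distribution constants from the text)."""
    n = len(values)
    cuts = quantiles(values, n=20, method="inclusive")  # 5th, 10th, ..., 95th
    p10, p25, p50, p75, p90 = cuts[1], cuts[4], cuts[9], cuts[14], cuts[17]
    d = p90 - p10                 # 10-90 percentile range
    q = (p75 - p25) / 2           # quartile deviation
    return {
        # Median minus the average of the 10th and 90th percentiles.
        "skew_difference": p50 - (p90 + p10) / 2,
        "skew_probable_error": 0.40412 * d / sqrt(n),
        # Below 0.26315 suggests leptokurtic, above suggests platykurtic.
        "q_over_d": q / d,
        "q_over_d_probable_error": 0.18736 / sqrt(n),
    }


print(skewness_and_kurtosis_checks(list(range(101))))
```

For the flat example sample the skewness difference is zero and the ratio of $Q$ to $D$ exceeds 0.26315, which is the direction the text associates with a platykurtic distribution.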

Appendix. 

Derivation of formulas giving probable errors. 

Let $\sigma$ equal the standard deviation of the measure designated by
the subscript.

$\sigma$ without a subscript will stand for the standard deviation of the
original distribution dealt with.

Let $p$ equal the proportion of cases below a certain value $P$ of the
variate; accordingly the value of the $100p$th percentile is $P$.

Let $q$ be defined by the usual relation $p + q = 1$.

Let $Q = \dfrac{P_{75} - P_{25}}{2}$.

Let $D_{100p-100p'} = P_{100p'} - P_{100p}$.

Let $D$ without subscript $= D_{10-90} = P_{90} - P_{10}$.

The following formulas are well known:

$$\sigma_P = \frac{1}{y_p}\sqrt{npq},$$

in which $y_p$ is the ordinate of the distribution at $P$, the $100p$th
percentile.

$$r_{pp'} = \sqrt{\frac{pq'}{p'q}},$$

in which $p < p'$.

Appendix A. 

The calculation of the probable error of the average deviation. 

The average deviation, or more exactly, the mean deviation, is the 
first moment of a distribution obtained by treating all the deviations 
from the mean of an original distribution as though positive. The 
mean deviation is thus the mean of a distribution of form like the half
of a normal distribution and its standard deviation is to be obtained in
exactly the same manner as the standard deviation of any other mean.

Accordingly, $\sigma_{A.D.} = \sqrt{\dfrac{\mu_2 - \mu_1^2}{N}}$, in which
$\mu_1$ and $\mu_2$ are the first and second moments around an arbitrary
origin. If moments around the stump of the half distribution are taken,
they are moments around an arbitrary origin so far as the half
distribution is concerned, though they are also moments around the mean
of the total normal distribution.

Accordingly $\mu_2 = \sigma^2$ and $\mu_1 = \dfrac{2\sigma^2 y_0}{N}$, in
which $y_0$ is the ordinate of the normal curve at the mean.
$\dfrac{\sigma y_0}{N}$ as found in Sheppard's tables $= 0.3989$.

Substituting and simplifying yields:

$$\sigma_{A.D.} = \frac{0.6028\,\sigma}{\sqrt{N}} = \frac{0.7555\,(A.D.)}{\sqrt{N}},$$

$$P.E._{A.D.} = \frac{0.5096\,(A.D.)}{\sqrt{N}}.$$
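
The constants of Appendix A can be retraced numerically. The sketch below assumes a unit normal distribution, for which the mean deviation is $\sqrt{2/\pi}$ (twice the tabled ordinate 0.3989) and the second moment about the mean is 1; the function name `average_deviation_constants` is illustrative only.

```python
from math import pi, sqrt


def average_deviation_constants():
    """Return the three constants of Appendix A for a unit normal:
    sigma_AD in units of sigma/sqrt(N), sigma_AD in units of A.D./sqrt(N),
    and the probable error in units of A.D./sqrt(N)."""
    mean_deviation = sqrt(2 / pi)               # first moment, about 0.7979
    sigma_ad = sqrt(1 - mean_deviation ** 2)    # sqrt(mu_2 - mu_1^2)
    relative = sigma_ad / mean_deviation        # expressed in terms of A.D.
    probable_error = 0.6745 * relative
    return sigma_ad, relative, probable_error


print(average_deviation_constants())
```

The three returned values should agree with 0.6028, 0.7555, and 0.5096 to the precision quoted in the text.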



Appendix B. 
The calculation of the probable error of an inter-percentile range. 



$$\sigma_{D_{100p-100p'}} = \sqrt{\sigma_P^2 + \sigma_{P'}^2 - 2 r_{pp'}\sigma_P\sigma_{P'}} = \sqrt{\frac{npq}{y^2} + \frac{np'q'}{(y')^2} - \frac{2npq'}{yy'}}.$$

If a normal distribution is assumed, each y is known when p is de- 
termined, and if n = 1 and (7 = 1, y then becomes the ordinate z recorded 
in Sheppard's tables. 

The reliability of a measure of dispersion is significant in terms of the 
magnitude of the measure, so that if the particular values of $p$ and $p'$
can be determined for which $\dfrac{\sigma_{D_{100p-100p'}}}{D_{100p-100p'}}$
is a minimum, the most reliable

inter-percentile range will be found. Calculation of such ratios for a 
normal distribution shows that for a D of a given magnitude, or for a 
given (lOOp — lOOp') magnitude, the ratio is always a minimum when 
p = l—p'. The writer has not proved this analytically, but empirical 
results warrant the generalization. It thus remains to determine the
value of $p$ for which the function,
$f = \dfrac{\sigma_{D_{100p-100p'}}}{D_{100p-100p'}}$, is a minimum.

Appendix C.

The determination of the inter-percentile range having the minimal 
error. 

It is desired to determine the value of $D$ for which the function just
given is a minimum. If $x$ equals a deviation from the mean, then,
assuming a normal distribution, $D = 2x$, and the solution of the
equation $\dfrac{df}{dx} = 0$ will give the value of $x$ and therefore of
$D$ and $p$ for the inter-percentile range having the minimal error.

$$f = \frac{\sqrt{\dfrac{npq}{y^2} + \dfrac{np'q'}{(y')^2} - \dfrac{2npq'}{yy'}}}{2x}; \qquad p = \int_{-\infty}^{x} y\,dx; \qquad n = 1.$$

$y$ is the value tabled as $z$ for different $x$'s and $p$'s in Sheppard's
tables. If equal amounts are cut off at each end of the distribution
then: $p = 1 - p'$; $p = q'$; $q = p'$; and $y = y'$; so that,

$$f = \frac{\sqrt{2p - 4p^2}}{2xy},$$

$$\frac{df}{f} = \frac{d\sqrt{2p - 4p^2}}{\sqrt{2p - 4p^2}} - \frac{d(2x)}{2x} - \frac{dy}{y}.$$

Noting that $p < .5$, that $\dfrac{dp}{dx} = y$, and that
$\dfrac{dy}{dx} = -xy$, the differential equation becomes:

$$\frac{df}{dx} = f\left[\frac{y(1 - 4p)}{2p - 4p^2} - \frac{1}{x} + x\right].$$

Setting this derivative equal to zero and solving by the aid of Shep- 
pard's tables yields p = 0.06917, so that the minimal error inter-per- 
centile range is that between the 6.917 and the 93.083 percentiles. 
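
The solution of Appendix C can be approximated numerically rather than from tables. The sketch below is one way to do so under stated assumptions: it sets the bracketed factor of the derivative, $\dfrac{y(1-4p)}{2p-4p^2} - \dfrac{1}{x} + x$, to zero by bisection, takes $x$ as the (negative) normal deviate of the lower percentile, and uses `statistics.NormalDist` in place of Sheppard's tables. The function names and the bracketing interval are illustrative choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def stationarity(p):
    """Bracketed factor of df/dx for a range cutting off p at each end."""
    x = STD_NORMAL.inv_cdf(p)   # deviate of the lower percentile (negative)
    y = STD_NORMAL.pdf(x)       # ordinate, the z of Sheppard's tables
    return y * (1 - 4 * p) / (2 * p - 4 * p ** 2) - 1 / x + x


def minimal_error_proportion(lo=0.01, hi=0.25, tol=1e-10):
    """Bisection on stationarity(p); assumes one sign change in [lo, hi]."""
    f_lo = stationarity(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = stationarity(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2


print(minimal_error_proportion())
```

The printed proportion should fall close to the 0.06917 reported in the text.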

Appendix D. 

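The calculation of the probable error of the difference between the
$P_{50}$ and the average of the $P_{10}$ and $P_{90}$ percentiles.

Let $f = P_{50} - \tfrac{1}{2}(P_{90} + P_{10})$.

$\Delta f = \Delta P_{50} - \tfrac{1}{2}\Delta P_{90} - \tfrac{1}{2}\Delta P_{10}$.

Squaring, summing, and as a first approximation setting
$\sigma_f^2 = \dfrac{\Sigma(\Delta f)^2}{N}$,
$\sigma_{P_{50}}^2 = \dfrac{\Sigma(\Delta P_{50})^2}{N}$, etc., gives,

$$\sigma_f^2 = \sigma_{P_{50}}^2 + \tfrac{1}{4}\sigma_{P_{90}}^2 + \tfrac{1}{4}\sigma_{P_{10}}^2 + \tfrac{1}{2} r_{P_{10}P_{90}}\sigma_{P_{10}}\sigma_{P_{90}} - r_{P_{50}P_{90}}\sigma_{P_{50}}\sigma_{P_{90}} - r_{P_{10}P_{50}}\sigma_{P_{10}}\sigma_{P_{50}}.$$

Simplifying, $\sigma_f = 1.53567\,\dfrac{\sigma}{\sqrt{n}} = 0.599143\,\dfrac{D}{\sqrt{n}}$,

$P.E._f = 0.40412\,\dfrac{D}{\sqrt{n}}$.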

Appendix E. 
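The calculation of the probable error of the quotient $\dfrac{Q}{D}$.

Let $f = \dfrac{Q}{D}$.

$$\frac{df}{f} = \frac{dQ}{Q} - \frac{dD}{D}$$

$$\frac{\sigma_f^2}{f^2} = \frac{\sigma_Q^2}{Q^2} + \frac{\sigma_D^2}{D^2} - 2r_{QD}\frac{\sigma_Q\sigma_D}{QD},$$

in which $r_{QD}$ may be shown to $= 0.5$.

Simplifying,

$\sigma_f = \dfrac{0.277794}{\sqrt{n}}$, or

$P.E._{Q/D} = \dfrac{0.18736}{\sqrt{n}}$.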
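
The statement that the correlation between $Q$ and $D$ is 0.5 can be checked with a short computation. The sketch below is an illustration under assumptions: it builds the covariance of any two sample percentiles from the well-known formulas of the Appendix (so that, for $n = 1$ and a unit normal, the covariance of the percentiles at $p < p'$ is $pq'/(zz')$), and treats $Q$ and $D$ as weighted sums of percentiles. All function and variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def percentile_cov(pa, pb):
    """Covariance (times n) of two sample percentiles of a unit normal."""
    lo, hi = min(pa, pb), max(pa, pb)
    z_lo = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(lo))
    z_hi = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(hi))
    return lo * (1.0 - hi) / (z_lo * z_hi)


def combo_cov(a, b):
    """Covariance of two weighted sums of percentiles ({proportion: weight})."""
    return sum(wa * wb * percentile_cov(pa, pb)
               for pa, wa in a.items() for pb, wb in b.items())


Q_WEIGHTS = {0.75: 0.5, 0.25: -0.5}   # Q = (P75 - P25) / 2
D_WEIGHTS = {0.90: 1.0, 0.10: -1.0}   # D = P90 - P10


def quotient_error():
    """Return r_QD, sigma_f and P.E. of f = Q/D, each for n = 1."""
    var_q = combo_cov(Q_WEIGHTS, Q_WEIGHTS)
    var_d = combo_cov(D_WEIGHTS, D_WEIGHTS)
    cov_qd = combo_cov(Q_WEIGHTS, D_WEIGHTS)
    q = sum(w * STD_NORMAL.inv_cdf(p) for p, w in Q_WEIGHTS.items())
    d = sum(w * STD_NORMAL.inv_cdf(p) for p, w in D_WEIGHTS.items())
    r_qd = cov_qd / sqrt(var_q * var_d)
    relative_variance = var_q / q**2 + var_d / d**2 - 2 * cov_qd / (q * d)
    sigma_f = (q / d) * sqrt(relative_variance)
    return r_qd, sigma_f, 0.6745 * sigma_f


print(quotient_error())
```

The three values should lie close to 0.5, 0.2778, and 0.1874, each of the last two still to be divided by $\sqrt{n}$.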



